The data set we chose comes from the IES National Center for Education Statistics. The specific study is the Education Longitudinal Study of 2002 (ELS:2002). ELS:2002 is a major longitudinal effort designed to provide trend data about critical transitions experienced by students as they proceed through high school and into postsecondary education. The 2002 sophomore cohort was followed, initially at 2-year intervals, to collect policy-relevant data about educational processes and outcomes. These data focus on student learning, student trajectories, student persistence and access to college, and entry into the workforce. The base year for the study was the spring term of 2002, when a national sample of high school sophomores was surveyed along with their parents, teachers, administrators, and librarians. The first follow-up took place two years later, in 2004. Students who remained in the same school for both 2002 and 2004 were resurveyed and tested in multiple areas, including mathematics. Students who no longer attended the same school for any reason (e.g., transferred, dropped out, or graduated early) were administered a questionnaire. Demographic information was collected for both groups along with academic scores.
Do the teacher-reported student math scores across the two timepoints (i.e., base year and first follow-up) correlate with 1) students’ sex, 2) students’ race/ethnicity, and 3) the highest education level of students’ mothers?
1 = “Male”
2 = “Female”
-4 = “Nonrespondent”
-8 = “Survey component legitimate skip/NA”
1 = “Amer. Indian/Alaska Native, non-Hispanic”
2 = “Asian, Hawaii/Pac. Islander,non-Hispanic”
3 = “Black or African American, non-Hispanic”
4 = “Hispanic, no race specified”
5 = “Hispanic, race specified”
6 = “More than one race, non-Hispanic”
7 = “White, non-Hispanic”
-4 = “Nonrespondent”
-8 = “Survey component legitimate skip/NA”
1 = “Did not finish high school”
2 = “Graduated from high school or GED”
3 = “Attended 2-year school, no degree”
4 = “Graduated from 2-year school”
5 = “Attended college, no 4-year degree”
6 = “Graduated from college”
7 = “Completed Master’s degree or equivalent”
8 = “Completed PhD, MD, other advanced degree”
-4 = “Nonrespondent”
-8 = “Survey component legitimate skip/NA”
-9 = “Missing”
Description: Math standardized T Score. The standardized T score provides a norm-referenced measurement of achievement, that is, an estimate of achievement relative to the population (spring 2002 10th-graders) as a whole. It provides information on status compared to peers (as distinguished from the IRT-estimated number-right score which represents status with respect to achievement on a particular criterion set of test items). The standardized T score is a transformation of the IRT theta (ability) estimate, rescaled to a mean of 50 and standard deviation of 10.
Description: Math standardized T Score. The standardized T score provides a norm-referenced measurement of achievement, that is, an estimate of achievement relative to the population (spring 2004 12th-graders) as a whole. It provides information on status compared with peers (as distinguished from the IRT-estimated number-right score which represents status with respect to achievement on a particular criterion set of test items). Although the T score is reported for all F1 in-school responding students (including transfer students), regardless of grade level, the comparison group for standardizing is the 12th grade population. The standardized T score is a transformation of the IRT theta (ability) estimate, and has a mean of 50 and standard deviation of 10 for the weighted subset of 12th-graders in the sample.
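To make the rescaling in these descriptions concrete, here is a minimal sketch of a T-score transformation on simulated ability estimates. The `theta` values are made up, and the sketch ignores the sampling weights that the description mentions, so it illustrates the idea rather than reproducing the NCES procedure.

```r
# Toy illustration of a T-score transformation (not the NCES procedure itself)
set.seed(2002)
theta <- rnorm(1000, mean = 0.3, sd = 1.2)  # hypothetical IRT ability estimates

# Standardize, then rescale to a mean of 50 and a standard deviation of 10
t_score <- 50 + 10 * (theta - mean(theta)) / sd(theta)

# By construction the rescaled scores have mean 50 and SD 10
round(c(mean = mean(t_score), sd = sd(t_score)), 2)
```

Because the scores are standardized within each wave, a T score of 50 in the base year and a T score of 50 in the follow-up both mean "average relative to that wave's reference population", not the same absolute level of achievement.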
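#load packages used throughout (car is loaded later, right before the ANOVA)
library(tidyverse)
library(ggridges)
library(psych)
library(kableExtra)
library(ggstatsplot)
library(sjPlot)
library(interactions)
#retrieve data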
#els <- read_csv("./data/els_02_12_byf3pststu_v1_0.csv")
#select columns
#els <- els %>% dplyr::select(STU_ID, BYSEX, BYRACE, BYMOTHED, BYTXMSTD, F1TXMSTD)
#save the revised (cleaned) data to csv
#write.csv(els,"./data/els_cleaned.csv", row.names = FALSE)
els <- read_csv("./data/els_cleaned.csv")
#replace missing data code to NA
els$BYSEX <- na_if(els$BYSEX, -4)
els$BYSEX <- na_if(els$BYSEX, -8)
els$BYRACE <- na_if(els$BYRACE, -4)
els$BYRACE <- na_if(els$BYRACE, -8)
els$BYMOTHED <- na_if(els$BYMOTHED, -4)
els$BYMOTHED <- na_if(els$BYMOTHED, -8)
els$BYMOTHED <- na_if(els$BYMOTHED, -9)
els$BYTXMSTD <- na_if(els$BYTXMSTD, -8)
els$F1TXMSTD <- na_if(els$F1TXMSTD, -8)
#remove rows that are missing both the BY and F1 math scores (rows with at least one score are kept)
els <- els %>%
filter(!is.na(BYTXMSTD) | !is.na(F1TXMSTD))
#rename
els <- els %>%
mutate(BYSEX = dplyr::recode(BYSEX,
`1` = "Male",
`2` = "Female"),
BYRACE = dplyr::recode(BYRACE,
`1` = "Native American/Alaskan",
`2` = "Asian",
`3` = "Black",
`4` = "Hispanic (no race specified)",
`5` = "Hispanic (specified)",
`6` = "More than one race, non-Hispanic",
`7` = "White, non-Hispanic"),
BYMOTHED = dplyr::recode(BYMOTHED,
`1` = "Did not finish high school",
`2` = "Graduated from high school or GED",
`3` = "Attended 2-year school, no degree",
`4` = "Graduated from 2-year school",
`5` = "Attended college, no 4-year degree",
`6` = "Graduated from college",
`7` = "Completed Master's degree or equivalent",
`8` = "Completed PhD, MD, other advanced degree"))
#rename columns to use pivot_longer
colnames(els)[colnames(els) %in% c("BYTXMSTD", "F1TXMSTD")] <- c("Base", "Follow-up")
els_longer <- els %>%
pivot_longer(
cols = c('Base', 'Follow-up'),
names_to = "YEAR",
values_to = "MATH"
)
els_wider_by <- els %>%
pivot_wider(
id_cols = !'Follow-up',
names_from = BYRACE,
values_from = c(Base)
)
els_wider_f1 <- els %>%
pivot_wider(
id_cols = !Base,
names_from = BYRACE,
values_from = c('Follow-up')
)
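The cleaning steps above (recoding the negative missing-data codes to NA, then filtering on the two math scores) can be sketched on a toy data frame. This is a base-R sketch with made-up values; `toy`, `missing_codes`, and `vars` are illustrative names, and it assumes that no legitimate value in these columns is negative.

```r
# Toy data with ELS-style negative missing-data codes (values are made up)
toy <- data.frame(
  STU_ID   = 1:5,
  BYSEX    = c(1, 2, -4, 2, 1),
  BYTXMSTD = c(52.1, -8, 47.3, -8, 60.2),
  F1TXMSTD = c(55.0, 49.9, -8, -8, -8)
)

# Recode every reserved negative code to NA in one pass
missing_codes <- c(-4, -8, -9)
vars <- c("BYSEX", "BYTXMSTD", "F1TXMSTD")
toy[vars] <- lapply(toy[vars], function(x) replace(x, x %in% missing_codes, NA))

# Keep rows with at least one score (what `|` does in the filter) ...
either <- toy[!is.na(toy$BYTXMSTD) | !is.na(toy$F1TXMSTD), ]
# ... versus rows with both scores (what `&` would do)
both <- toy[!is.na(toy$BYTXMSTD) & !is.na(toy$F1TXMSTD), ]

nrow(either)  # 4 rows: only the student with no score at all is dropped
nrow(both)    # 1 row: only the student with both scores is kept
```

The choice matters later: the `|` version keeps students who were tested only once, which is why the per-year counts of valid scores differ between the base year and the follow-up.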
vis1Data <- els_longer %>%
mutate(YEAR = factor(YEAR,
levels = c("Follow-up",
"Base"))) %>%
filter(!is.na(BYSEX)) %>%
ggplot(aes(x=MATH,y=YEAR,fill=YEAR)) +
geom_col(position="dodge", show.legend = FALSE) +
facet_wrap(~ BYSEX,ncol=1) +
labs(x="Math Scores",
y="Year",
title="Student Math Scores",
subtitle="by year and sex"
) +
scale_fill_manual(values = c("maroon", "gold")) +
theme_light()
vis1Data
vis2Data <- els_longer %>%
mutate(YEAR = factor(YEAR,
levels = c("Follow-up",
"Base"))) %>%
filter(!is.na(BYRACE)) %>%
ggplot(aes(x=MATH,y=YEAR,fill=YEAR)) +
geom_col(position="dodge", show.legend = FALSE) +
facet_wrap(~ BYRACE,ncol=1) +
labs(x="Math Scores",
y="Year",
title="Student Math Scores",
subtitle="by year and race"
) +
scale_fill_manual(values = c("maroon", "gold")) +
theme_light()
vis2Data
# Alternate graph combining Visualization 1 & 2? Maybe easier that way?
vis2DataAlternate <- els_longer %>%
mutate(YEAR = factor(YEAR,
levels = c("Follow-up",
"Base"))) %>%
mutate(BYSEX = factor(BYSEX,
levels = c("Male",
"Female"))) %>%
filter(!is.na(BYRACE)) %>%
ggplot(aes(x=MATH,y=YEAR,fill=BYSEX)) +
geom_col(position="dodge") +
facet_wrap(~ BYRACE,ncol=1) +
labs(x="Math Scores",
y="Year",
fill = "Sex",
title="Student Math Scores",
subtitle="by year and race, separated by sex"
) +
scale_fill_manual(values = c("maroon", "gold"), breaks = c("Male", "Female")) +
theme_light()
vis2DataAlternate
These are simple distribution plots of the standardized math scores by race, sex, and mother’s education for the base year and the first follow-up.
# Fixed the names to look better.
els_viz <- els_longer %>%
mutate(RACE = dplyr::recode(BYRACE, "Native American/Alaskan" = "Native American\n /Alaskan",
"Asian" = "Asian",
"Black" = "Black",
"Hispanic (no race specified)" = "Hispanic",
"Hispanic (specified)" = "Hispanic\n (Race specified)",
"More than one race, non-Hispanic" = "2+ races\n non-Hispanic",
"White, non-Hispanic" = "White\n non-Hispanic"),
MOTHED = dplyr::recode(BYMOTHED,
"Did not finish high school" = "Did not finish\n high school",
"Graduated from high school or GED" = "Graduated high\n school or GED",
"Attended 2-year school, no degree" = "Attended 2-year school\n no degree",
"Graduated from 2-year school" = "Graduated 2-year\n school",
"Attended college, no 4-year degree" = "Attended college\n no degree",
"Graduated from college" = "Graduated college",
"Completed Master's degree or equivalent" = "Master's degree",
"Completed PhD, MD, other advanced degree" = "PhD, MD,other\nadvanced degree")) %>%
mutate(RACE = factor(RACE, levels = c("White\n non-Hispanic",
"Black",
"Hispanic",
"Hispanic\n (Race specified)",
"Asian",
"Native American\n /Alaskan",
"2+ races\n non-Hispanic")),
MOTHED = factor(MOTHED, levels = c("Did not finish\n high school",
"Graduated high\n school or GED",
"Attended 2-year school\n no degree",
"Graduated 2-year\n school",
"Attended college\n no degree",
"Graduated college",
"Master's degree",
"PhD, MD,other\nadvanced degree")))
# A plot of the distribution of math scores by race in year 1 and follow up
els_viz %>%
filter(!is.na(MATH) & !is.na(RACE)) %>%
ggplot(aes(x = MATH)) +
geom_histogram(col='black',fill='white')+
theme_minimal() +
xlab("Math Scores") +
xlim(10,90)+
facet_wrap( ~ RACE + YEAR, nrow = 2, ncol=7)+
theme(strip.background =element_rect(fill="white"))
# A plot of the distribution of math scores by sex in year 1 and follow up.
els_viz %>%
filter(!is.na(MATH) & !is.na(RACE) & !is.na(BYSEX)) %>%
ggplot(aes(x = MATH)) +
geom_histogram(col='black',fill='white')+
theme_minimal() +
xlab("Math Scores") +
xlim(10,90)+
facet_wrap( ~ BYSEX + YEAR, nrow = 1, ncol=4)+
theme(strip.background =element_rect(fill="white"))
# A Plot of distribution of math scores by mother's education for year 1 and follow up.
els_viz %>%
filter(!is.na(MATH) & !is.na(MOTHED)) %>%
ggplot(aes(x = MATH)) +
geom_histogram(col='black',fill='white')+
theme_minimal() +
xlab("Math Scores") +
xlim(10,90)+
facet_wrap( ~ MOTHED + YEAR, nrow = 2, ncol=8)+
theme(strip.background =element_rect(fill="white"))
# Honestly I don't love the visualization, so let's look at a box plot of the data.
# How about a density plot?
A boxplot of the standardized math scores by Race, Sex, and Mother’s education.
# Boxplot of math scores by Race, separated by Sex (both years pooled)
els_viz %>%
filter(!is.na(MATH) & !is.na(RACE)) %>%
ggplot(aes(x= RACE, y=MATH)) +
geom_boxplot(aes(fill = RACE), show.legend = FALSE)+
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_fill_viridis_d(option = 'plasma')+
theme_minimal()+
facet_wrap(~BYSEX)+
labs(x = "",
y = "Math Scores",
title = "Math Score by Race",
subtitle = "Separated by sex")+
coord_flip()
# Boxplot of math scores by Mother's education separated by Year
els_viz %>%
filter(!is.na(MATH) & !is.na(MOTHED)) %>%
ggplot(aes(x= MOTHED, y=MATH)) +
geom_boxplot(aes(fill = MOTHED), show.legend = FALSE)+
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
coord_flip()+
scale_fill_viridis_d(option = 'mako')+
theme_minimal()+
facet_wrap(~YEAR)+
labs(x = "",
y = "Math Scores",
title = "Math Score by Mother Education and Year")
# Boxplot of math scores by sex separated by year
els_viz %>%
filter(!is.na(MATH) & !is.na(BYSEX)) %>%
ggplot(aes(x= BYSEX, y=MATH)) +
geom_boxplot(aes(fill = BYSEX), show.legend = FALSE)+
scale_fill_viridis_d()+
theme_minimal()+
facet_wrap(~YEAR)+
labs(x = "",
y = "Math Scores",
title = "Math Score by Sex and Year")
Finally, let’s examine the data using density plots.
# Density plot of math scores by Race and Sex.
els_viz %>%
filter(!is.na(MATH) & !is.na(RACE)) %>%
ggplot(aes(x = MATH, y = RACE))+
geom_density_ridges(aes(fill = RACE), alpha=0.5)+
scale_fill_viridis_d(option = 'plasma')+
theme_minimal()+
theme(legend.position = "none")+
facet_wrap(~BYSEX)
# Density plot of math scores by Mother education
els_viz %>%
filter(!is.na(MATH) & !is.na(MOTHED)) %>%
ggplot(aes(x = MATH, y = MOTHED))+
geom_density_ridges(aes(fill = MOTHED), alpha=0.5)+
scale_fill_viridis_d()+
theme_minimal()+
theme(legend.position = "none")+
labs(x = "Math Score",
y = "Mother's Education Level")
# Density plot of math scores by mother education separated by sex
els_viz %>%
filter(!is.na(MATH) & !is.na(MOTHED)) %>%
ggplot(aes(x = MATH, y = MOTHED))+
geom_density_ridges(aes(fill = MOTHED), alpha=0.5)+
scale_fill_viridis_d()+
theme_minimal()+
theme(legend.position = "none")+
labs(x = "Math Score",
y = "Mother's Education Level")+
facet_wrap(~BYSEX, nrow = 1)
#Create table by Race (n() counts all rows, including rows with a missing MATH score)
By_Race <- els_longer %>%
group_by(BYRACE) %>%
summarize(race_n = n(),
mean_math = mean(MATH, na.rm = TRUE),
sd_math = sd(MATH, na.rm = TRUE))
#Use describeBy to get kurtosis and skew
describeBy(els_longer$MATH, els_longer$BYRACE)
##
## Descriptive statistics by group
## group: Asian
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 2727 54.04 10.77 54.13 54.17 11.53 19.82 86.68 66.86 -0.1 -0.45
## se
## X1 0.21
## ------------------------------------------------------------
## group: Black
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 3627 44.34 8.47 44.1 44.19 8.67 19.94 76.32 56.38 0.17 -0.15
## se
## X1 0.14
## ------------------------------------------------------------
## group: Hispanic (no race specified)
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 1776 45.73 9.14 45.45 45.58 9.67 20.53 76.92 56.39 0.17 -0.26
## se
## X1 0.22
## ------------------------------------------------------------
## group: Hispanic (specified)
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 2182 45.91 9.94 45.69 45.76 10.33 21.96 75.94 53.98 0.15 -0.36
## se
## X1 0.21
## ------------------------------------------------------------
## group: More than one race, non-Hispanic
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 1311 50.49 9.68 50.6 50.73 9.71 22.33 80.59 58.26 -0.19 -0.13
## se
## X1 0.27
## ------------------------------------------------------------
## group: Native American/Alaskan
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 234 45.64 8.18 45.48 45.56 8.65 24.07 72.75 48.68 0.16 -0.07 0.53
## ------------------------------------------------------------
## group: White, non-Hispanic
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 16229 52.92 9.25 53.27 53.17 9.41 19.38 82.63 63.25 -0.24 -0.14
## se
## X1 0.07
By_Race %>%
kbl(caption = "Math Descriptives by Race",
digits = 2,
col.names = c("Reported Race", "n", "Mean ", "SD")) %>%
kable_classic()
| Reported Race | n | Mean | SD |
|---|---|---|---|
| Asian | 2920 | 54.04 | 10.77 |
| Black | 4040 | 44.34 | 8.47 |
| Hispanic (no race specified) | 1992 | 45.73 | 9.14 |
| Hispanic (specified) | 2442 | 45.91 | 9.94 |
| More than one race, non-Hispanic | 1470 | 50.49 | 9.68 |
| Native American/Alaskan | 260 | 45.64 | 8.18 |
| White, non-Hispanic | 17364 | 52.92 | 9.25 |
| NA | 1804 | 49.52 | 9.48 |
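The n column in this table is larger than the n reported by describeBy (for example, 2920 versus 2727 for the Asian group). That is consistent with how the summary is built: `n()` counts rows, whereas describeBy reports valid (non-missing) scores. A minimal base-R sketch on made-up data, with `toy_long` as an illustrative name:

```r
# Toy long-format data: one row per student per wave, some scores missing
toy_long <- data.frame(
  GROUP = c("A", "A", "A", "A", "B", "B"),
  MATH  = c(50, NA, 55, 45, NA, 60)
)

# Rows per group (what n() counts) vs. non-missing scores per group
n_rows  <- tapply(toy_long$MATH, toy_long$GROUP, length)
n_valid <- tapply(toy_long$MATH, toy_long$GROUP, function(x) sum(!is.na(x)))

data.frame(group   = names(n_rows),
           n_rows  = as.vector(n_rows),
           n_valid = as.vector(n_valid))
```

If the table should report the number of scores behind each mean, replacing `n()` with `sum(!is.na(MATH))` inside `summarize()` would be one way to do it.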
#table for Mothers Education
By_MotherED <- els_longer %>%
group_by(BYMOTHED) %>%
summarize(mothed_n = n(),
mean_math = mean(MATH, na.rm = TRUE),
sd_math = sd(MATH, na.rm = TRUE))
describeBy(els_longer$MATH, els_longer$BYMOTHED)
##
## Descriptive statistics by group
## group: Attended 2-year school, no degree
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 3418 49.62 9.36 49.99 49.78 9.64 19.94 76.34 56.4 -0.15 -0.3
## se
## X1 0.16
## ------------------------------------------------------------
## group: Attended college, no 4-year degree
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 2974 51.52 9.31 51.88 51.74 9.67 23.55 80.59 57.04 -0.19 -0.23
## se
## X1 0.17
## ------------------------------------------------------------
## group: Completed Master's degree or equivalent
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 2019 56.87 9.4 57.49 57.28 9.44 23 80.21 57.21 -0.46 0.25 0.21
## ------------------------------------------------------------
## group: Completed PhD, MD, other advanced degree
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 583 55.64 11.11 56.69 56.24 11.08 22.05 78.65 56.6 -0.48 -0.13
## se
## X1 0.46
## ------------------------------------------------------------
## group: Did not finish high school
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 3343 44.63 9.39 44.24 44.39 9.56 20.34 80.02 59.68 0.25 -0.17
## se
## X1 0.16
## ------------------------------------------------------------
## group: Graduated from 2-year school
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 3014 51.08 9.31 51.25 51.23 9.42 21.54 86.68 65.14 -0.13 -0.19
## se
## X1 0.17
## ------------------------------------------------------------
## group: Graduated from college
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 5327 54.63 9.45 55.22 54.94 9.46 23.37 84.85 61.48 -0.28 -0.13
## se
## X1 0.13
## ------------------------------------------------------------
## group: Graduated from high school or GED
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 7460 48.69 9.44 48.74 48.69 9.84 19.38 84 64.62 0.01 -0.28 0.11
By_MotherED %>%
kbl(caption = "Math Descriptives by Mother's Education Level",
digits = 2,
col.names = c("Mother's Education Level", "n", "Mean ", "SD")) %>%
kable_classic()
| Mother’s Education Level | n | Mean | SD |
|---|---|---|---|
| Attended 2-year school, no degree | 3696 | 49.62 | 9.36 |
| Attended college, no 4-year degree | 3178 | 51.52 | 9.31 |
| Completed Master’s degree or equivalent | 2120 | 56.87 | 9.40 |
| Completed PhD, MD, other advanced degree | 622 | 55.64 | 11.11 |
| Did not finish high school | 3862 | 44.63 | 9.39 |
| Graduated from 2-year school | 3240 | 51.08 | 9.31 |
| Graduated from college | 5640 | 54.63 | 9.45 |
| Graduated from high school or GED | 8234 | 48.69 | 9.44 |
| NA | 1700 | 49.81 | 9.24 |
#table for year
By_Year <- els_longer %>%
group_by(YEAR) %>%
summarize(year_n = n(),
mean_math = mean(MATH, na.rm = TRUE),
sd_math = sd(MATH, na.rm = TRUE))
describeBy(els_longer$MATH, els_longer$YEAR)
##
## Descriptive statistics by group
## group: Base
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 15892 50.71 9.91 50.83 50.84 10.23 19.38 86.68 67.3 -0.1 -0.19
## se
## X1 0.08
## ------------------------------------------------------------
## group: Follow-up
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 13648 50.66 10.11 50.85 50.74 10.85 19.82 79.85 60.03 -0.07 -0.51
## se
## X1 0.09
By_Year %>%
kbl(caption = "Math Descriptives by Year",
digits = 2,
col.names = c("Year", "n", "Mean ", "SD")) %>%
kable_classic()
| Year | n | Mean | SD |
|---|---|---|---|
| Base | 16146 | 50.71 | 9.91 |
| Follow-up | 16146 | 50.66 | 10.11 |
#Table by sex
By_Sex <- els_longer %>%
group_by(BYSEX) %>%
summarize(sex_n = n(),
mean_math = mean(MATH, na.rm = TRUE),
sd_math = sd(MATH, na.rm = TRUE))
describeBy(els_longer$MATH, els_longer$BYSEX)
##
## Descriptive statistics by group
## group: Female
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 14203 50.11 9.64 50.37 50.23 10.2 19.82 84 64.18 -0.1 -0.37 0.08
## ------------------------------------------------------------
## group: Male
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 13966 51.35 10.41 51.42 51.48 10.85 19.38 86.68 67.3 -0.1 -0.37
## se
## X1 0.09
By_Sex %>%
kbl(caption = "Math Descriptives by Sex",
digits = 2,
col.names = c("Reported Sex", "n", "Mean ", "SD")) %>%
kable_classic()
| Reported Sex | n | Mean | SD |
|---|---|---|---|
| Female | 15400 | 50.11 | 9.64 |
| Male | 15254 | 51.35 | 10.41 |
| NA | 1638 | 49.96 | 9.09 |
library(car)
math_mod <- lm(MATH ~ 1 + MOTHED*RACE, data = els_viz)
Anova(math_mod, type = 3)
## Anova Table (Type III tests)
##
## Response: MATH
## Sum Sq Df F value Pr(>F)
## (Intercept) 1845001 1 23299.2150 < 2.2e-16 ***
## MOTHED 149753 7 270.1601 < 2.2e-16 ***
## RACE 27333 6 57.5283 < 2.2e-16 ***
## MOTHED:RACE 15813 42 4.7546 < 2.2e-16 ***
## Residuals 2219619 28030
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Too many comparisons. Instead we are going to look at one way anova comparisons
aovRaceMothed <- aov(MATH ~ RACE*MOTHED, data = els_viz)
tukey_all <- TukeyHSD(aovRaceMothed, conf.level = 0.95)
aovRace <- aov(MATH ~ RACE,data=els_viz)
tukey_race <- TukeyHSD(aovRace,conf.level = 0.95)
# Visualization of Tukeys HSD pairwise comparisons by race
plot(tukey_race, col = "brown")
# Let's look if Mother's education is a predictor. Did not finish high school is the reference group
#contrasts(els_viz$MOTHED)
aovMothed <- aov(MATH ~ MOTHED,data=els_viz)
tukey_mothed <- TukeyHSD(aovMothed,conf.level = 0.95)
# Visualization of Tukeys HSD pairwise comparisons by mother's education
plot(tukey_mothed, col = "red")
#str(els_viz)
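The Tukey plots above are easier to read alongside the underlying numbers. As a minimal sketch on simulated data (the groups `g1` to `g3` and their means are made up), this shows what a `TukeyHSD()` result contains and how to pull out the pairs whose confidence interval excludes zero:

```r
set.seed(1)
toy <- data.frame(
  group = factor(rep(c("g1", "g2", "g3"), each = 30)),
  score = c(rnorm(30, 45, 9), rnorm(30, 50, 9), rnorm(30, 55, 9))
)

fit <- aov(score ~ group, data = toy)
hsd <- TukeyHSD(fit, conf.level = 0.95)

# One row per pair: mean difference, lower and upper CI bound, adjusted p-value
hsd$group

# Pairs whose 95% interval excludes zero
sig <- hsd$group[, "lwr"] > 0 | hsd$group[, "upr"] < 0
rownames(hsd$group)[sig]
```

With seven race groups there are 21 pairs and with eight education levels there are 28, so a filtered table like this may be easier to report than the full plot.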
# Paired student's t test to examine if means from year 1 and year 2 are significantly different
# Created a small data set of just years and scores plus student id
els_byyear <- els_viz %>%
group_by(YEAR)%>%
filter(!is.na(MATH)) %>%
select(STU_ID, YEAR, MATH)%>%
pivot_wider(names_from = YEAR,
values_from = MATH) %>%
rename("Follow" = "Follow-up")
# Paired student's t test to examine if the mean from the follow up is significantly greater than the base year mean
t.test(els_byyear$Follow, els_byyear$Base, paired = TRUE, alternative = "greater")
##
## Paired t-test
##
## data: els_byyear$Follow and els_byyear$Base
## t = -20.537, df = 13393, p-value = 1
## alternative hypothesis: true mean difference is greater than 0
## 95 percent confidence interval:
## -0.9029677 Inf
## sample estimates:
## mean difference
## -0.8360042
# Paired student's t test to examine if means from base year and follow up are significantly different.
t.test(els_byyear$Base, els_byyear$Follow, paired = TRUE)
##
## Paired t-test
##
## data: els_byyear$Base and els_byyear$Follow
## t = 20.537, df = 13393, p-value < 2.2e-16
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 0.7562106 0.9157978
## sample estimates:
## mean difference
## 0.8360042
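Two details of the output above are worth spelling out: a paired t test is a one-sample t test on the within-student differences, and only students with both scores contribute (df = 13393 corresponds to 13394 complete pairs). A minimal sketch on simulated scores, where all values are made up:

```r
set.seed(42)
base   <- rnorm(200, mean = 50, sd = 10)
follow <- base - rnorm(200, mean = 0.8, sd = 5)
follow[sample(200, 20)] <- NA  # some students lack a follow-up score

# Paired test: incomplete pairs are dropped automatically
paired <- t.test(base, follow, paired = TRUE)

# Same test written as a one-sample test on the differences of complete pairs
complete <- !is.na(base) & !is.na(follow)
one_samp <- t.test(base[complete] - follow[complete], mu = 0)

c(paired = unname(paired$statistic), one_sample = unname(one_samp$statistic))
paired$parameter  # degrees of freedom = number of complete pairs - 1
```

This also explains why the paired mean difference need not equal the gap between the two overall yearly means: the yearly means use every available score, while the paired test uses only students tested in both waves.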
# Visualization of the means by year. The t test suggests a significant difference in scores by year even though the means are similar. Unfortunately, the follow-up mean is significantly lower than the base-year mean. I was thinking we could focus on visualizing the data by race, mother's education, and sex rather than comparing base-year to follow-up data.
ggwithinstats(data = els_viz, x = YEAR, y = MATH,
type = "parametric",
centrality.plotting = TRUE,
pairwise.display = "s",
point.path = FALSE,
point.args = aes(size = 0, alpha = 0.2),
results.subtitle = FALSE,
alternative = "greater")
# First let us look if Race is a predictor. I will set white as the reference group since it is the largest group.
#contrasts(els_viz$RACE)
# Looks like I set white as the reference earlier.
mod_race <- lm(MATH ~ 1 + RACE, els_viz)
tab_model(mod_race)
| MATH | | | |
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 52.92 | 52.78 – 53.07 | <0.001 |
| RACE [Black] | -8.59 | -8.93 – -8.25 | <0.001 |
| RACE [Hispanic] | -7.19 | -7.65 – -6.73 | <0.001 |
| RACE [Hispanic (Race specified)] | -7.02 | -7.43 – -6.60 | <0.001 |
| RACE [Asian] | 1.11 | 0.73 – 1.49 | <0.001 |
| RACE [Native American /Alaskan] | -7.28 | -8.49 – -6.07 | <0.001 |
| RACE [2+ races non-Hispanic] | -2.43 | -2.96 – -1.91 | <0.001 |
| Observations | 28086 | | |
| R2 / R2 adjusted | 0.127 / 0.126 | | |
# This just tells us that every group is significantly different from the reference group (White, non-Hispanic)
# I think we could report pairwise comparisons
race_pairs <- pairwise.t.test(els_viz$MATH, els_viz$RACE, p.adjust.method = "bonf")
race_pval <- race_pairs$p.value %>%
round(digits = 3)
options(knitr.kable.NA = "")
race_pval %>%
kbl(caption = "p -values of Math Score by Race",
digits = 3) %>%
kable_classic()
| | White non-Hispanic | Black | Hispanic | Hispanic (Race specified) | Asian | Native American /Alaskan |
|---|---|---|---|---|---|---|
| Black | 0 | | | | | |
| Hispanic | 0 | 0.000 | | | | |
| Hispanic (Race specified) | 0 | 0.000 | 1 | | | |
| Asian | 0 | 0.000 | 0 | 0 | | |
| Native American /Alaskan | 0 | 0.808 | 1 | 1 | 0 | |
| 2+ races non-Hispanic | 0 | 0.000 | 0 | 0 | 0 | 0 |
# Based on this, most groups are significantly different **except** for Black/Native American, Hispanic/Hispanic (Race specified), Native American/Hispanic, and Native American/Hispanic (Race specified)
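The entries shown as 0 and 1 in the p-value table come from two separate effects: the Bonferroni adjustment caps adjusted p-values at 1, and `round(digits = 3)` collapses very small p-values to 0. A minimal sketch with made-up raw p-values (`p_raw` is illustrative) shows both, plus one way to print small values as "<0.001" instead:

```r
p_raw <- c(3e-12, 0.004, 0.03, 0.4)  # hypothetical unadjusted p-values

# Bonferroni: multiply by the number of comparisons and cap at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni")
all.equal(p_bonf, pmin(1, p_raw * length(p_raw)))

# Rounding makes the smallest adjusted value look like exactly 0
round(p_bonf, 3)

# format.pval() reports values below eps as "<0.001" instead
format.pval(p_bonf, digits = 3, eps = 0.001)
```

Note that `format.pval()` returns character strings, so it would be applied just before the table is printed rather than before any further calculation.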
mod_mothed <- lm(MATH ~ 1 + MOTHED, els_viz)
tab_model(mod_mothed)
| MATH | | | |
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 44.63 | 44.31 – 44.95 | <0.001 |
| MOTHED [Graduated high school or GED] | 4.07 | 3.68 – 4.45 | <0.001 |
| MOTHED [Attended 2-year school no degree] | 4.99 | 4.54 – 5.44 | <0.001 |
| MOTHED [Graduated 2-year school] | 6.45 | 5.98 – 6.91 | <0.001 |
| MOTHED [Attended college no degree] | 6.89 | 6.43 – 7.36 | <0.001 |
| MOTHED [Graduated college] | 10.01 | 9.60 – 10.41 | <0.001 |
| MOTHED [Master’s degree] | 12.25 | 11.72 – 12.77 | <0.001 |
| MOTHED [PhD, MD,other advanced degree] | 11.01 | 10.18 – 11.84 | <0.001 |
| Observations | 28138 | | |
| R2 / R2 adjusted | 0.118 / 0.117 | | |
# This just tells us that every group is significantly different from the reference group (Did not finish high school)
# I think we could report pairwise comparisons
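It may help to see why the regression table only compares each group with the reference group. With R's default treatment coding, the intercept is the mean of the reference level and every other coefficient is that level's mean minus the reference mean, which is why the intercept of 44.63 above matches the "Did not finish high school" mean from describeBy. A minimal sketch on simulated data, where the levels `low`, `mid`, and `high` are made up:

```r
set.seed(7)
toy <- data.frame(
  level = factor(rep(c("low", "mid", "high"), each = 40),
                 levels = c("low", "mid", "high")),  # first level is the reference
  score = c(rnorm(40, 45, 9), rnorm(40, 49, 9), rnorm(40, 55, 9))
)

fit <- lm(score ~ level, data = toy)
group_means <- tapply(toy$score, toy$level, mean)

coef(fit)["(Intercept)"]  # equals the mean of the reference level ("low")
coef(fit)["levelmid"]     # equals mean(mid) - mean(low)
group_means
```

Comparisons between two non-reference levels (for example `mid` versus `high`) are not in the coefficient table at all, which is the gap the pairwise tests fill.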
mothed_pairs <- pairwise.t.test(els_viz$MATH, els_viz$MOTHED, p.adjust.method = "bonf")
mothed_pval <- mothed_pairs$p.value %>%
round(digits = 3)
options(knitr.kable.NA = "")
mothed_pval %>%
kbl(caption = "p -values of Math Score by Mother's Education level",
digits = 3) %>%
kable_classic()
| | Did not finish high school | Graduated high school or GED | Attended 2-year school no degree | Graduated 2-year school | Attended college no degree | Graduated college | Master’s degree |
|---|---|---|---|---|---|---|---|
| Graduated high school or GED | 0 | | | | | | |
| Attended 2-year school no degree | 0 | 0 | | | | | |
| Graduated 2-year school | 0 | 0 | 0 | | | | |
| Attended college no degree | 0 | 0 | 0 | 1 | | | |
| Graduated college | 0 | 0 | 0 | 0 | 0 | | |
| Master’s degree | 0 | 0 | 0 | 0 | 0 | 0.000 | |
| PhD, MD,other advanced degree | 0 | 0 | 0 | 0 | 0 | 0.409 | 0.151 |
I am not sure how to do a regression analysis of all three variables, but I can look at the regression analysis of math scores based on race/ethnicity and mother’s education.
mod_moth_race <- lm(MATH ~ 1 + MOTHED*RACE, data = els_viz)
#summary(mod_moth_race)
tab_model(mod_moth_race)
| MATH | | | |
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 46.40 | 45.80 – 46.99 | <0.001 |
| MOTHED [Graduated high school or GED] | 3.77 | 3.12 – 4.42 | <0.001 |
| MOTHED [Attended 2-year school no degree] | 5.48 | 4.77 – 6.18 | <0.001 |
| MOTHED [Graduated 2-year school] | 6.37 | 5.65 – 7.09 | <0.001 |
| MOTHED [Attended college no degree] | 7.28 | 6.55 – 8.01 | <0.001 |
| MOTHED [Graduated college] | 9.47 | 8.80 – 10.13 | <0.001 |
| MOTHED [Master’s degree] | 11.57 | 10.81 – 12.32 | <0.001 |
| MOTHED [PhD, MD,other advanced degree] | 12.11 | 10.99 – 13.22 | <0.001 |
| RACE [Black] | -5.37 | -6.37 – -4.36 | <0.001 |
| RACE [Hispanic] | -2.98 | -3.88 – -2.08 | <0.001 |
| RACE [Hispanic (Race specified)] | -4.74 | -5.67 – -3.81 | <0.001 |
| RACE [Asian] | 2.70 | 1.78 – 3.62 | <0.001 |
| RACE [Native American /Alaskan] | -2.66 | -5.71 – 0.39 | 0.087 |
| RACE [2+ races non-Hispanic] | -1.15 | -3.03 – 0.74 | 0.233 |
| MOTHED [Graduated high school or GED] × RACE [Black] | -2.16 | -3.34 – -0.99 | <0.001 |
| MOTHED [Attended 2-year school no degree] × RACE [Black] | -2.70 | -4.02 – -1.39 | <0.001 |
| MOTHED [Graduated 2-year school] × RACE [Black] | -2.71 | -4.06 – -1.35 | <0.001 |
| MOTHED [Attended college no degree] × RACE [Black] | -2.48 | -3.84 – -1.12 | <0.001 |
| MOTHED [Graduated college] × RACE [Black] | -3.00 | -4.30 – -1.70 | <0.001 |
| MOTHED [Master’s degree] × RACE [Black] | -1.76 | -3.53 – 0.01 | 0.052 |
| MOTHED [PhD, MD,other advanced degree] × RACE [Black] | -8.36 | -10.97 – -5.75 | <0.001 |
| MOTHED [Graduated high school or GED] × RACE [Hispanic] | -2.27 | -3.56 – -0.98 | 0.001 |
| MOTHED [Attended 2-year school no degree] × RACE [Hispanic] | -2.70 | -4.25 – -1.15 | 0.001 |
| MOTHED [Graduated 2-year school] × RACE [Hispanic] | -2.06 | -3.83 – -0.30 | 0.022 |
| MOTHED [Attended college no degree] × RACE [Hispanic] | -1.93 | -3.61 – -0.25 | 0.024 |
| MOTHED [Graduated college] × RACE [Hispanic] | -4.12 | -5.93 – -2.30 | <0.001 |
| MOTHED [Master’s degree] × RACE [Hispanic] | -0.98 | -3.72 – 1.75 | 0.481 |
| MOTHED [PhD, MD,other advanced degree] × RACE [Hispanic] | -2.82 | -6.36 – 0.73 | 0.119 |
| MOTHED [Graduated high school or GED] × RACE [Hispanic (Race specified)] | -0.01 | -1.22 – 1.21 | 0.991 |
| MOTHED [Attended 2-year school no degree] × RACE [Hispanic (Race specified)] | -0.54 | -2.04 – 0.97 | 0.484 |
| MOTHED [Graduated 2-year school] × RACE [Hispanic (Race specified)] | 0.07 | -1.63 – 1.77 | 0.935 |
| MOTHED [Attended college no degree] × RACE [Hispanic (Race specified)] | -1.74 | -3.32 – -0.16 | 0.031 |
| MOTHED [Graduated college] × RACE [Hispanic (Race specified)] | -0.28 | -1.73 – 1.16 | 0.701 |
| MOTHED [Master’s degree] × RACE [Hispanic (Race specified)] | -2.37 | -4.39 – -0.36 | 0.021 |
| MOTHED [PhD, MD,other advanced degree] × RACE [Hispanic (Race specified)] | -2.86 | -6.13 – 0.41 | 0.087 |
| MOTHED [Graduated high school or GED] × RACE [Asian] | 1.65 | 0.45 – 2.85 | 0.007 |
| MOTHED [Attended 2-year school no degree] × RACE [Asian] | -3.12 | -4.74 – -1.51 | <0.001 |
| MOTHED [Graduated 2-year school] × RACE [Asian] | -0.19 | -1.81 – 1.43 | 0.818 |
| MOTHED [Attended college no degree] × RACE [Asian] | -3.39 | -5.03 – -1.76 | <0.001 |
| MOTHED [Graduated college] × RACE [Asian] | -1.65 | -2.83 – -0.47 | 0.006 |
| MOTHED [Master’s degree] × RACE [Asian] | -1.67 | -3.26 – -0.08 | 0.040 |
| MOTHED [PhD, MD,other advanced degree] × RACE [Asian] | -4.36 | -6.66 – -2.05 | <0.001 |
| MOTHED [Graduated high school or GED] × RACE [Native American /Alaskan] | -1.15 | -4.83 – 2.53 | 0.540 |
| MOTHED [Attended 2-year school no degree] × RACE [Native American /Alaskan] | -7.58 | -12.71 – -2.44 | 0.004 |
| MOTHED [Graduated 2-year school] × RACE [Native American /Alaskan] | -5.15 | -10.04 – -0.26 | 0.039 |
| MOTHED [Attended college no degree] × RACE [Native American /Alaskan] | -6.51 | -10.67 – -2.36 | 0.002 |
| MOTHED [Graduated college] × RACE [Native American /Alaskan] | -5.71 | -9.94 – -1.49 | 0.008 |
| MOTHED [Master’s degree] × RACE [Native American /Alaskan] | -5.88 | -11.98 – 0.22 | 0.059 |
| MOTHED [PhD, MD,other advanced degree] × RACE [Native American /Alaskan] | 7.61 | -5.13 – 20.35 | 0.242 |
| MOTHED [Graduated high school or GED] × RACE [2+ races non-Hispanic] | -0.98 | -3.09 – 1.14 | 0.364 |
| MOTHED [Attended 2-year school no degree] × RACE [2+ races non-Hispanic] | -3.18 | -5.54 – -0.82 | 0.008 |
| MOTHED [Graduated 2-year school] × RACE [2+ races non-Hispanic] | -0.07 | -2.53 – 2.39 | 0.958 |
| MOTHED [Attended college no degree] × RACE [2+ races non-Hispanic] | -0.23 | -2.55 – 2.09 | 0.846 |
| MOTHED [Graduated college] × RACE [2+ races non-Hispanic] | -0.56 | -2.76 – 1.64 | 0.620 |
| MOTHED [Master’s degree] × RACE [2+ races non-Hispanic] | -1.38 | -4.00 – 1.24 | 0.302 |
| MOTHED [PhD, MD,other advanced degree] × RACE [2+ races non-Hispanic] | -8.16 | -12.03 – -4.30 | <0.001 |
| Observations | 28086 | | |
| R2 / R2 adjusted | 0.214 / 0.213 | | |
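On the question of a regression with all three variables: one common option is to add sex as a further term in the same `lm()` formula. The sketch below uses simulated data with made-up level names (`r1`, `e1`, and so on) and made-up effect sizes, so it only shows the mechanics, not what the ELS data would give.

```r
set.seed(123)
n <- 300
toy <- data.frame(
  SEX    = factor(sample(c("Male", "Female"), n, replace = TRUE)),
  RACE   = factor(sample(c("r1", "r2", "r3"), n, replace = TRUE)),
  MOTHED = factor(sample(c("e1", "e2", "e3"), n, replace = TRUE))
)
# Simulated outcome with arbitrary effects for one education and one race level
toy$MATH <- 50 + 2 * (toy$MOTHED == "e3") - 3 * (toy$RACE == "r2") + rnorm(n, sd = 9)

# Main effects of all three predictors
mod_main <- lm(MATH ~ SEX + RACE + MOTHED, data = toy)

# Same model plus the race-by-education interaction, still adjusting for sex
mod_int <- lm(MATH ~ SEX + RACE * MOTHED, data = toy)

# F test: does the interaction improve on the main-effects model?
anova(mod_main, mod_int)
```

Applied here, the analogous call would presumably be `lm(MATH ~ BYSEX + MOTHED*RACE, data = els_viz)`. One caveat: `els_viz` has two rows per student (base year and follow-up), so the rows are not independent, and the standard errors from a plain `lm()` should be read with that in mind.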
Now let’s try to visualize this; it might get messy.
# Visualization of regression models for mother's education and race
#install.packages("interactions")
cat_plot(mod_moth_race, pred = MOTHED, modx = RACE, geom = "line", interval = FALSE,vary.lty = TRUE)
The bar version below includes the error bars, but it crowds the plot and makes comparing across groups more difficult.
cat_plot(mod_moth_race, pred = MOTHED, modx = RACE, geom = "bar")